individual reward
- Research Report > New Finding (0.93)
- Overview (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
- Education (0.67)
- Leisure & Entertainment > Games > Computer Games (0.46)
- Research Report > New Finding (0.93)
- Overview (0.67)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Fair Cooperation in Mixed-Motive Games via Conflict-Aware Gradient Adjustment
Multi-agent reinforcement learning in mixed-motive settings presents a fundamental challenge: agents must balance individual interests with collective goals, which are neither fully aligned nor strictly opposed. To address this, reward restructuring methods such as gifting and intrinsic motivation have been proposed. However, these approaches primarily focus on promoting cooperation by managing the trade-off between individual and collective returns, without explicitly addressing fairness with respect to the agents' task-specific rewards. In this paper, we propose an adaptive conflict-aware gradient adjustment method that promotes cooperation while ensuring fairness in individual rewards. The proposed method dynamically balances policy gradients derived from individual and collective objectives in situations where the two objectives are in conflict. By explicitly resolving such conflicts, our method improves collective performance while preserving fairness across agents. We provide theoretical results that guarantee monotonic non-decreasing improvement in both the collective and individual objectives and ensure fairness. Empirical results in sequential social dilemma environments demonstrate that our approach outperforms baselines in terms of social welfare while ensuring fairness among agents.
- Information Technology (0.46)
- Social Sector (0.35)
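The core mechanism in the abstract above is gradient-level conflict resolution between the individual and collective objectives. As a rough illustration, here is a minimal sketch assuming a PCGrad-style projection rule (the paper's exact adjustment is not reproduced here): when the two gradients conflict (negative inner product), the component of the individual gradient that opposes the collective gradient is removed before the two are combined.

```python
import numpy as np

def adjust_gradient(g_ind: np.ndarray, g_col: np.ndarray) -> np.ndarray:
    """Combine individual and collective policy gradients.

    If the gradients conflict (negative inner product), project the
    individual gradient onto the normal plane of the collective one
    before summing; otherwise sum them directly.
    """
    dot = g_ind @ g_col
    if dot < 0.0:
        # Remove the component of g_ind that opposes g_col.
        g_ind = g_ind - (dot / (g_col @ g_col + 1e-12)) * g_col
    return g_ind + g_col

# Toy usage: conflicting gradients in 2-D.
g_individual = np.array([1.0, -1.0])
g_collective = np.array([0.0, 1.0])
print(adjust_gradient(g_individual, g_collective))  # -> [1. 1.]
```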
Appendix A Pseudocode of DRE-MARL
The pseudocode for DRE-MARL training is shown in Algorithm 20. The experiments use the Navigation and Reference environment, abbreviated REF: a scenario with two agents and three landmarks in which the received reward is collaborative and the target landmark of each agent is known only to its partner.
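For concreteness, here is a minimal sketch of the REF setup described above, assuming a standard particle-world formulation. Only the reset/observation/reward structure is sketched; the class and field names are illustrative, and this is not the authors' Algorithm 20.

```python
import numpy as np

class RefEnv:
    """Navigation-and-Reference toy setup: two agents, three landmarks,
    fully collaborative reward. Each agent's target landmark is sampled
    privately and revealed only to its partner."""

    def __init__(self, seed: int = 0):
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.landmarks = self.rng.uniform(-1, 1, size=(3, 2))
        self.agents = self.rng.uniform(-1, 1, size=(2, 2))
        self.targets = self.rng.integers(0, 3, size=2)  # target of agent i
        # Each observation contains the *partner's* target, not the agent's own.
        return [self._obs(i) for i in range(2)]

    def _obs(self, i: int) -> dict:
        partner = 1 - i
        return {
            "self_pos": self.agents[i],
            "landmarks": self.landmarks,
            "partner_target": int(self.targets[partner]),
        }

    def reward(self) -> float:
        # Shared (collaborative) reward: negative summed distance of each
        # agent to its own target landmark.
        dists = [np.linalg.norm(self.agents[i] - self.landmarks[self.targets[i]])
                 for i in range(2)]
        return -float(sum(dists))
```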
Speaking the Language of Teamwork: LLM-Guided Credit Assignment in Multi-Agent Reinforcement Learning
Lin, Muhan, Shi, Shuyang, Guo, Yue, Tadiparthi, Vaishnav, Chalaki, Behdad, Pari, Ehsan Moradi, Stepputtis, Simon, Kim, Woojun, Campbell, Joseph, Sycara, Katia
Credit assignment, the process of attributing credit or blame to individual agents for their contributions to a team's success or failure, remains a fundamental challenge in multi-agent reinforcement learning (MARL), particularly in environments with sparse rewards. Commonly-used approaches such as value decomposition often lead to suboptimal policies in these settings, and designing dense reward functions that align with human intuition can be complex and labor-intensive. In this work, we propose a novel framework where a large language model (LLM) generates dense, agent-specific rewards based on a natural language description of the task and the overall team goal. By learning a potential-based reward function over multiple queries, our method reduces the impact of ranking errors while allowing the LLM to evaluate each agent's contribution to the overall task. Through extensive experiments, we demonstrate that our approach achieves faster convergence and higher policy returns compared to state-of-the-art MARL baselines.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
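The shaping scheme sketched in the abstract above can be illustrated as follows. This is a minimal sketch: `query_llm_score` is a hypothetical stub standing in for a real LLM call, and averaging repeated queries stands in for the paper's learned potential function.

```python
import numpy as np

def query_llm_score(task: str, agent_id: int, state_summary: str) -> float:
    """Hypothetical stub: ask an LLM to rate agent `agent_id`'s
    contribution to `task` in `state_summary`, on a 0..1 scale.
    A real system would call an LLM API and parse its rating."""
    return 0.5  # placeholder value

def potential(task: str, agent_id: int, state_summary: str,
              n_queries: int = 5) -> float:
    """Average several (noisy) LLM judgments to dampen ranking errors."""
    scores = [query_llm_score(task, agent_id, state_summary)
              for _ in range(n_queries)]
    return float(np.mean(scores))

def shaped_reward(r_env: float, task: str, agent_id: int,
                  s: str, s_next: str, gamma: float = 0.99) -> float:
    """Potential-based shaping, which leaves optimal policies unchanged:
    r' = r + gamma * Phi(s') - Phi(s)."""
    return (r_env
            + gamma * potential(task, agent_id, s_next)
            - potential(task, agent_id, s))
```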
Networked Agents in the Dark: Team Value Learning under Partial Observability
Varela, Guilherme S., Sardinha, Alberto, Melo, Francisco S.
We propose a novel cooperative multi-agent reinforcement learning (MARL) approach for networked agents. In contrast to previous methods that rely on complete state information or joint observations, our agents must learn how to reach shared objectives under partial observability. During training, they collect individual rewards and approximate a team value function through local communication, resulting in cooperative behavior. To describe our problem, we introduce the networked dynamic partially observable Markov game framework, in which agents communicate over a communication network with a switching topology. Our distributed method, DNA-MARL, uses a consensus mechanism for local communication and gradient descent for local computation. DNA-MARL broadens the range of possible applications of networked agents, being well suited for real-world domains that impose privacy requirements and in which messages may not reach their recipients. We evaluate DNA-MARL across benchmark MARL scenarios. Our results highlight the superior performance of DNA-MARL over previous methods.
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)
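The consensus mechanism described above admits a compact illustration. Below is a minimal sketch assuming simple row-normalised averaging over whichever links are up in the current round; DNA-MARL's exact weighting scheme may differ.

```python
import numpy as np

def consensus_step(values: np.ndarray, adjacency: np.ndarray) -> np.ndarray:
    """One round of average consensus: each agent replaces its team-value
    estimate with a weighted average over its current neighbours.

    values:    (n_agents, dim) local estimates of the team value parameters
    adjacency: (n_agents, n_agents) 0/1 matrix of the *current* topology
               (it may switch between rounds); self-loops included.
    """
    # Row-normalise so each agent's weights sum to one (a simple choice).
    weights = adjacency / adjacency.sum(axis=1, keepdims=True)
    return weights @ values

# Toy usage: three agents on a line graph (agent 1 talks to 0 and 2).
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)
v = np.array([[0.0], [3.0], [6.0]])
for _ in range(20):
    v = consensus_step(v, A)
print(v.ravel())  # the three estimates approach a common value
```

In the full method, each consensus round would be interleaved with a local gradient-descent step on each agent's value parameters.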
Achieving Collective Welfare in Multi-Agent Reinforcement Learning via Suggestion Sharing
Jin, Yue, Wei, Shuangqing, Montana, Giovanni
In human society, the conflict between self-interest and collective well-being often obstructs efforts to achieve shared welfare. Related concepts like the Tragedy of the Commons and Social Dilemmas frequently manifest in our daily lives. As artificial agents increasingly serve as autonomous proxies for humans, we propose using multi-agent reinforcement learning (MARL) to address this issue: learning policies that maximise collective returns even when individual agents' interests conflict with the collective interest. Traditional MARL solutions involve sharing rewards, values, and policies, or designing intrinsic rewards to encourage agents to learn collectively optimal policies. We introduce a novel MARL approach based on Suggestion Sharing (SS), in which agents exchange only action suggestions. This method enables effective cooperation without the need to design intrinsic rewards, achieving strong performance while revealing less private information than sharing rewards, values, or policies. Our theoretical analysis establishes a bound on the discrepancy between collective and individual objectives, demonstrating how sharing suggestions can align agents' behaviour with the collective objective. Experimental results demonstrate that SS performs competitively with baselines that rely on value or policy sharing or intrinsic rewards.
- Leisure & Entertainment (0.67)
- Information Technology > Security & Privacy (0.46)
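One way exchanged action suggestions could enter an agent's update is as an imitation term toward a partner's suggested action. The following is a minimal sketch under that assumption; the loss form and weighting are illustrative, not the paper's exact SS objective, and the agent's own RL loss would be added separately.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def suggestion_loss_grad(logits: np.ndarray, suggested_action: int,
                         weight: float = 1.0) -> np.ndarray:
    """Gradient of a cross-entropy term pulling the policy toward a
    partner's suggested action: d/dlogits of -weight * log pi(a_sugg)."""
    grad = softmax(logits)
    grad[suggested_action] -= 1.0
    return weight * grad

# Toy usage: a 3-action policy nudged toward a partner's suggestion.
logits = np.zeros(3)
for _ in range(50):
    logits -= 0.1 * suggestion_loss_grad(logits, suggested_action=2)
print(softmax(logits))  # probability mass shifts toward action 2
```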
Collaborative Adaptation for Recovery from Unforeseen Malfunctions in Discrete and Continuous MARL Domains
Findik, Yasin, Hasenfus, Hunter, Azadeh, Reza
Cooperative multi-agent learning plays a crucial role in developing effective strategies for achieving individual or shared objectives in multi-agent teams. In real-world settings, agents may face unexpected failures, such as a robot's leg malfunctioning or a teammate's battery running out. These malfunctions reduce the team's ability to accomplish its assigned task(s), especially if they occur after the learning algorithms have already converged on a collaborative strategy. Current leading approaches in Multi-Agent Reinforcement Learning (MARL) often recover slowly, if at all, from such malfunctions. To overcome this limitation, we present the Collaborative Adaptation (CA) framework, highlighting its unique capability to operate in both continuous and discrete domains. Our framework enhances agents' adaptability to unexpected failures by integrating inter-agent relationships into their learning processes, thereby accelerating recovery from malfunctions. We evaluated our framework's performance through experiments in both discrete and continuous environments. Empirical results reveal that in scenarios involving unforeseen malfunctions, state-of-the-art algorithms often converge on sub-optimal solutions, whereas the proposed CA framework mitigates their impact and recovers more effectively.
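The abstract above describes folding inter-agent relationships into the learning process. Below is a minimal sketch of one such mechanism, assuming a row-stochastic relationship matrix that mixes teammates' rewards into each agent's TD target; the CA framework's actual formulation may differ.

```python
import numpy as np

def relational_td_update(q_values: np.ndarray, rewards: np.ndarray,
                         relationships: np.ndarray,
                         lr: float = 0.1, gamma: float = 0.99) -> np.ndarray:
    """One TD(0)-style update in which each agent's target mixes in its
    teammates' rewards through a relationship matrix (an assumed
    mechanism, used here only for illustration).

    q_values:      (n_agents,) current state-action values
    rewards:       (n_agents,) individual rewards this step
    relationships: (n_agents, n_agents) row-stochastic influence weights
    """
    mixed_rewards = relationships @ rewards      # credit flows along relationships
    targets = mixed_rewards + gamma * q_values   # bootstrap on current values
    return q_values + lr * (targets - q_values)

# Toy usage: after agent 1 "malfunctions" (zero reward), agent 0's update
# still reflects the team outcome through the relationship weights.
R = np.array([[0.7, 0.3],
              [0.3, 0.7]])
q = np.zeros(2)
r = np.array([1.0, 0.0])  # agent 1 failed to collect reward
q = relational_td_update(q, r, R)
print(q)  # -> [0.07 0.03]
```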